LanceDB
Overview
LanceDB is a vector database for multimodal AI applications. Qarbine interfaces with LanceDB Cloud, LanceDB’s database-as-a-service (DBaaS) offering. More information can be found at https://www.lancedb.com. Qarbine supports native LanceDB vector query interactions. For example, the specification below
{
  table: 'youtubeVectors',
  nearText: "dracula",
  limit: 5
}
can return the answer set below.
The structure of the first element is shown to the right.
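Conceptually, nearText asks for the rows whose embedding vectors lie closest to the embedding of the given text, and limit caps how many rows come back. The brute-force sketch below illustrates only that idea. It assumes Euclidean (L2) distance, toy two-dimensional vectors, a hypothetical `vector` field, and a hypothetical `_distance` tag. LanceDB performs this search natively and may use an index, so treat this as an illustration rather than how LanceDB or Qarbine is implemented.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def near_vector(rows, query_vector, limit, vector_field="vector"):
    """Return up to `limit` rows, closest first, each tagged with its distance.

    A brute-force scan for illustration only; a vector database answers
    the same question natively and can use an index.
    """
    # Copy each row so the caller's data is not modified.
    scored = [
        {**row, "_distance": l2_distance(row[vector_field], query_vector)}
        for row in rows
    ]
    scored.sort(key=lambda r: r["_distance"])
    return scored[:limit]

if __name__ == "__main__":
    # Hypothetical rows with toy 2-dimensional embeddings.
    rows = [
        {"title": "Nosferatu", "vector": [0.9, 0.1]},
        {"title": "Cooking Basics", "vector": [0.1, 0.9]},
        {"title": "Bram Stoker's Dracula", "vector": [1.0, 0.0]},
    ]
    # Pretend [1.0, 0.05] is the embedding of the text "dracula".
    for row in near_vector(rows, [1.0, 0.05], limit=2):
        print(row["title"], round(row["_distance"], 3))
```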
Defining a Data Source
Overview
A Data Source is a Qarbine component responsible for retrieving data from somewhere. At a high level it has a name, a description, and a query string that, when sent to the associated Qarbine Data Service endpoint, returns data. The overall execution flow for an analysis, including the optional prompt component, is shown below.
A single data source can be referenced by name from multiple Qarbine template components. This provides a single point of change when, for example, an index is added or some other query tweak is necessary. The alternative is to find every template affected by, say, a schema or index change. This component reusability is especially beneficial when team members have varying roles and skills.
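The benefit of referencing a data source by name can be sketched with a small catalog. Every class and method name here (`DataSource`, `Catalog`, `Template`, `save`, `resolve`) is hypothetical and illustrates the idea only. It is not Qarbine’s API. Because each template stores just the data source name, updating the query once in the catalog changes what every template runs.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """Hypothetical model of a data source: a name, a description, a query."""
    name: str
    description: str
    query: str

class Catalog:
    """Hypothetical catalog that templates consult by data source name."""

    def __init__(self):
        self._sources = {}

    def save(self, source: DataSource) -> None:
        # Saving under an existing name replaces the earlier definition.
        self._sources[source.name] = source

    def resolve(self, name: str) -> DataSource:
        return self._sources[name]

@dataclass
class Template:
    """A template keeps only the data source *name*, not a copy of its query."""
    title: str
    data_source_name: str

    def query(self, catalog: Catalog) -> str:
        return catalog.resolve(self.data_source_name).query

if __name__ == "__main__":
    catalog = Catalog()
    catalog.save(DataSource("similarVideos", "Videos near a phrase",
                            "select * from youtubeVectors limit 5"))
    report_a = Template("Report A", "similarVideos")
    report_b = Template("Report B", "similarVideos")
    # One change in the catalog is picked up by both templates.
    catalog.save(DataSource("similarVideos", "Videos near a phrase",
                            "select * from youtubeVectors limit 10"))
    print(report_a.query(catalog))
    print(report_b.query(catalog))
```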
Query Language
Qarbine provides 2 options for specifying the LanceDB retrieval:
- JSON specification and
- SQL.
Complete details on these options are described in the LanceDB Data Interactions document.
Example
The data source below retrieves up to 5 movies that are “similar” to the “dracula” phrase.
select *
from youtubeVectors
where nearVector( [! embeddings("dracula") !] )
order by start
limit 5
The above has the same effect as
select *
from youtubeVectors
where nearText("dracula")
order by start
limit 5
The difference is where the embedding is computed. In the first query, the client retrieves the embedding for “dracula” and inserts it into the final query, which is then run on the Qarbine server. In the second query, both the embedding retrieval and the query execution happen on the server side.
A sample result element is shown in the section above.
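The client-side case can be pictured as a text substitution that happens before the query is sent. The sketch below assumes the macro always has the literal form `[! embeddings("...") !]` and uses a caller-supplied `embed` function as a stand-in for the configured embedding service. The regular expression and the function name `expand_embeddings` are illustrative only. Qarbine’s actual macro evaluation is more general than this.

```python
import re

# Matches a client-side macro such as [! embeddings("dracula") !]
# and captures the quoted text (assumed to contain no double quotes).
MACRO = re.compile(r'\[!\s*embeddings\("([^"]*)"\)\s*!\]')

def expand_embeddings(query: str, embed) -> str:
    """Replace each embeddings("...") macro with a literal vector.

    `embed` is a hypothetical stand-in for the embedding service: any
    callable that maps text to a list of numbers.
    """
    def to_literal(match):
        vector = embed(match.group(1))
        return "[" + ", ".join(repr(float(x)) for x in vector) + "]"
    return MACRO.sub(to_literal, query)

if __name__ == "__main__":
    sql = ('select * from youtubeVectors '
           'where nearVector( [! embeddings("dracula") !] ) '
           'order by start limit 5')
    # A fake, fixed 2-dimensional "embedding" for demonstration.
    print(expand_embeddings(sql, lambda text: [0.25, 0.5]))
```

In the server-side nearText form, no such substitution happens on the client. The text itself travels with the query, and the embedding is resolved where the query runs.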
Managing Answer Set Size
The default maximum number of rows for a new data source is 25. This is useful while you evolve a query from a concept into one you have verified returns the desired answer set. As noted, limiting the answer set size natively in the query is the preferred approach. This setting is in the component dialog as shown below and is also accessible by clicking the ‘Gear’ icon.
Once you are done drafting, you can adjust this parameter. A “0” indicates there is no maximum. A number greater than 0 limits the final answer set to that number of rows. This truncation happens after any native query limit is applied. So if the data endpoint returns a large answer set, all of that content must first be transferred to the Qarbine host, which only then truncates it. It is best to truncate at the query level (i.e., use a limit) so that less content is sent from the data endpoint to the Qarbine host in the first place.
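The ordering described above (native limit first, Qarbine maximum second) can be modeled in a few lines. This is a minimal sketch. The function names are hypothetical, and rejecting negative values is an assumption added for safety. It shows why a native limit is preferable: without one, every row crosses the wire even though most are then discarded.

```python
def apply_max_rows(rows, max_rows):
    """Post-query truncation: 0 means no maximum, n > 0 keeps the first n rows."""
    if max_rows < 0:
        # Assumption: negative values are treated as invalid in this sketch.
        raise ValueError("max_rows must be 0 or a positive integer")
    rows = list(rows)
    return rows if max_rows == 0 else rows[:max_rows]

def rows_transferred_and_kept(endpoint_rows, query_limit, max_rows):
    """Return (rows sent to the host, rows kept after truncation)."""
    # The native query limit, if any, is applied at the data endpoint first...
    returned = list(endpoint_rows) if query_limit is None else list(endpoint_rows)[:query_limit]
    # ...and only afterwards does the host apply its own maximum.
    kept = apply_max_rows(returned, max_rows)
    return len(returned), len(kept)

if __name__ == "__main__":
    data = range(1000)
    print(rows_transferred_and_kept(data, None, 25))  # no native limit
    print(rows_transferred_and_kept(data, 25, 0))     # native limit only
```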
Adjusting the Maximum Rows
Recall that the default maximum rows at the component level is 25. When you are satisfied with your query, you can change that setting by clicking the ‘Gear’ icon.
Adjust the setting to “0” to indicate no Qarbine answer set truncation.
Then click the button shown below to apply the change.
Prompt Integration
Overview
Qarbine prompts provide a way to obtain runtime values and variables for data source and template execution. To avoid hardcoding, prompts can use macro formulas to run queries which populate list widgets. Prompts are defined in a no code manner using the Prompt Designer. Shown below is the execution flow when there is a Prompt component.
The Prompt Designer supports a large variety of input widgets including entry fields, check boxes, radio button groups, sliders, and file input.
Example
Let’s define a Qarbine Prompt component to obtain the userInput variable value to apply in the Data Source. A Qarbine Template will use this prompt later. The Prompt Designer is essentially a no-code dialog builder. In this example we ask the user for only a single value, although Qarbine prompts can ask for many values and present entry fields, lists, checkboxes, radio buttons, and other widgets.
The running prompt is shown below.
The Qarbine prompt component has 2 elements.
The first element is defined as shown below.
Notice that the image URL can be a macro language expression, not just a simple string. The second element is defined as shown below.
The component is saved in the Qarbine catalog and can be referenced by data sources and analysis templates.
Defining an Analysis Template
Overview
A template defines how to process the data retrieved from Data Source queries and other data expressions. It also defines formulas, formatting options, and other analysis and presentation options. The overall execution flow for an analysis, including the optional prompt component, is shown below.
Qarbine provides an extensive set of formatting and data interaction functionality to produce publication-quality, interactive analytics. There are over 450 Excel-like macro functions to apply when defining templates. These include aggregation functions such as max, min, avg, sum, and count.
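To make the aggregation functions concrete, the sketch below computes the same five aggregates over an answer set in plain Python. It does not use Qarbine’s macro syntax. The field name `views` is hypothetical, and skipping rows where the field is missing is an assumption of this sketch rather than documented Qarbine behavior.

```python
from statistics import mean

def aggregate(rows, field):
    """Compute count, sum, min, max, and avg of `field` across an answer set.

    Rows where the field is missing or None are skipped (an assumption).
    """
    values = [row[field] for row in rows if row.get(field) is not None]
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": mean(values) if values else None,
    }

if __name__ == "__main__":
    # Hypothetical answer set with a numeric "views" field.
    answer_set = [{"views": 10}, {"views": 40}, {"views": 25}, {"title": "no views"}]
    print(aggregate(answer_set, "views"))
```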
Using the Template Designer
The Template Designer tool integrates features leveraging Microsoft Word formatting, Excel formulas, and PowerPoint layout concepts. The template defines how to iterate over the retrieved data, apply formulas, and present the results. The results can be publication-quality reports that also offer interactive end user options.
The result of running the template described next is shown below.
It presents YouTube videos from the sample youtubeVectors table based on a phrase provided by the end user. This sample data has just a few fields, and none of them has nested content. Qarbine handles any data shape, including highly dynamic shapes that vary within the same answer set.
The template’s primary properties are shown below.
. . .
It uses the Data Source defined previously. It references the Prompt as shown below.
The general cell layout is shown below.
The right-hand side of the Template Designer shows any metadata about the data source data. (No cell may be selected in the grid area for this to appear.)
This is a fairly simple template. The first body line uses the following cells.
The second body line has the following cells.
The button cell definition is shown below.
The second cell definition is shown below.
Running this template first presents the prompt dialog, in which the user types a phrase into the text area.
Clicking OK propagates that variable value into the template execution flow. As described above, the “dracula movies” userPhrase value flows to the Qarbine backend, which retrieves the vector for that text using the configured AI Assistant. That vector is placed into the LanceDB query’s vector field argument. The query is sent to the LanceDB database, and the results are returned to the Qarbine backend, which processes them according to the template definition. The result is then shown to the user.
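The end-to-end flow above can be summarized as four stages: prompt value, embedding, vector search, and rendering. The sketch below wires those stages together with stand-ins. The `embed` function takes the place of the configured AI Assistant, an in-memory list takes the place of the LanceDB table, and the `render` function takes the place of the template. All of these names and the toy data are hypothetical, and Euclidean distance is assumed for the search.

```python
import math

def run_analysis(user_phrase, embed, table_rows, limit, render):
    """Hypothetical sketch of the prompt -> embed -> search -> render flow."""
    # 1. The prompt value reaches the backend (here, a plain argument).
    # 2. The embedding stand-in turns the phrase into a vector.
    query_vector = embed(user_phrase)
    # 3. The vector drives a nearest-neighbor search, capped at `limit` rows.
    results = sorted(
        table_rows, key=lambda row: math.dist(row["vector"], query_vector)
    )[:limit]
    # 4. The "template" formats each resulting row for display.
    return [render(row) for row in results]

if __name__ == "__main__":
    rows = [
        {"title": "Nosferatu", "vector": [0.9, 0.1]},
        {"title": "Cooking Basics", "vector": [0.1, 0.9]},
        {"title": "Bram Stoker's Dracula", "vector": [1.0, 0.0]},
    ]

    def toy_embed(text):
        # Toy embedding: vampire-related text maps near [1, 0].
        return [1.0, 0.0] if "dracula" in text.lower() else [0.0, 1.0]

    for line in run_analysis("dracula movies", toy_embed, rows, 2,
                             lambda row: row["title"]):
        print(line)
```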
Next Steps
Accessing Your Database
To configure access to your database see the guides at
http://doc.qarbine.com/docs/category/data-service-configuration
Querying Your Database
For database specific interaction guides navigate to
http://doc.qarbine.com/docs/category/data-source-designer
References
Information on LanceDB queries can be found at https://lancedb.github.io/lancedb/sql/.